Waller County
Multi-task GINN-LP for Multi-target Symbolic Regression
Rajabu, Hussein, Qian, Lijun, Dong, Xishuang
In the area of explainable artificial intelligence, Symbolic Regression (SR) has emerged as a promising approach by discovering interpretable mathematical expressions that fit data. However, SR faces two main challenges: most methods are evaluated on scientific datasets with well-understood relationships, limiting generalization, and SR primarily targets single-output regression, whereas many real-world problems involve multi-target outputs with interdependent variables. To address these issues, we propose multi-task regression GINN-LP (MTRGINN-LP), an interpretable neural network for multi-target symbolic regression. By integrating GINN-LP with a multi-task deep learning, the model combines a shared backbone including multiple power-term approximator blocks with task-specific output layers, capturing inter-target dependencies while preserving interpretability. We validate multi-task GINN-LP on practical multi-target applications, including energy efficiency prediction and sustainable agriculture. Experimental results demonstrate competitive predictive performance alongside high interpretability, effectively extending symbolic regression to broader real-world multi-output tasks.
- Asia > China > Heilongjiang Province > Harbin (0.05)
- North America > United States > Texas > Waller County > Prairie View (0.05)
- Europe > Finland (0.04)
- (2 more...)
Transmitter Identification and Protocol Categorization in Shared Spectrum via Multi-Task RF Classification at the Network Edge
Abdul-Quddoos, Tariq, Sharmin, Tasnia, Li, Xiangfang, Qian, Lijun
Abstract--As spectrum sharing becomes increasingly vital to meet rising wireless demands in the future, spectrum monitoring and transmitter identification are indispensable for enforcing spectrum usage policy, efficient spectrum utilization, and network security. This study proposed a robust framework for transmitter identification and protocol categorization via multi-task RF signal classification in shared spectrum environments, where the spectrum monitor will classify transmission protocols (e.g., 4G L TE, 5G-NR, IEEE 802.11a) operating within the same frequency bands, and identify different transmitting base stations, as well as their combinations. A Convolutional Neural Network (CNN) is designed to tackle critical challenges such as overlapping signal characteristics and environmental variability. The proposed method employs a multi-channel input strategy to extract meaningful signal features, achieving remarkable accuracy: 90% for protocol classification, 100% for transmitting base station classification, and 92% for joint classification tasks, utilizing RF data from the POWDER platform. These results highlight the significant potential of the proposed method to enhance spectrum monitoring, management, and security in modern wireless networks.
- North America > United States > Utah (0.04)
- North America > United States > Texas > Waller County > Prairie View (0.04)
- Telecommunications (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
- Information Technology > Networks (0.67)
Systematic Comparative Analysis of Large Pretrained Language Models on Contextualized Medication Event Extraction
Abdul-Quddoos, Tariq, Dong, Xishuang, Qian, Lijun
Attention-based models have become the leading approach in modeling medical language for Natural Language Processing (NLP) in clinical notes. These models outperform traditional techniques by effectively capturing contextual representations of language. In this research a comparative analysis is done amongst pre-trained attention based models namely Bert Base, BioBert, two variations of Bio+Clinical Bert, RoBerta, and Clinical Longformer on task related to Electronic Health Record (EHR) information extraction. The tasks from Track 1 of Harvard Medical School's 2022 National Clinical NLP Challenges (n2c2) are considered for this comparison, with the Contextualized Medication Event Dataset (CMED) given for these task. CMED is a dataset of unstructured EHRs and annotated notes that contain task relevant information about the EHRs. The goal of the challenge is to develop effective solutions for extracting contextual information related to patient medication events from EHRs using data driven methods. Each pre-trained model is fine-tuned and applied on CMED to perform medication extraction, medical event detection, and multi-dimensional medication event context classification. Processing methods are also detailed for breaking down EHRs for compatibility with the applied models. Performance analysis has been carried out using a script based on constructing medical terms from the evaluation portion of CMED with metrics including recall, precision, and F1-Score. The results demonstrate that models pre-trained on clinical data are more effective in detecting medication and medication events, but Bert Base, pre-trained on general domain data showed to be the most effective for classifying the context of events related to medications.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Texas > Waller County > Prairie View (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Asia (0.04)
- Research Report > Experimental Study (0.49)
- Research Report > New Finding (0.48)
- Health & Medicine > Health Care Technology > Medical Record (0.92)
- Government > Regional Government (0.69)
Enhancing Learning Path Recommendation via Multi-task Learning
Nasrin, Afsana, Qian, Lijun, Obiomon, Pamela, Dong, Xishuang
Personalized learning is a student-centered educational approach that adapts content, pace, and assessment to meet each learner's unique needs. As the key technique to implement the personalized learning, learning path recommendation sequentially recommends personalized learning items such as lectures and exercises. Advances in deep learning, particularly deep reinforcement learning, have made modeling such recommendations more practical and effective. This paper proposes a multi-task LSTM model that enhances learning path recommendation by leveraging shared information across tasks. The approach reframes learning path recommendation as a sequence-to-sequence (Seq2Seq) prediction problem, generating personalized learning paths from a learner's historical interactions. The model uses a shared LSTM layer to capture common features for both learning path recommendation and deep knowledge tracing, along with task-specific LSTM layers for each objective. To avoid redundant recommendations, a non-repeat loss penalizes repeated items within the recommended learning path. Experiments on the ASSIST09 dataset show that the proposed model significantly outperforms baseline methods for the learning path recommendation.
- North America > United States > Texas > Waller County > Prairie View (0.04)
- Asia (0.04)
Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)
Sarker, Shouvon, Dong, Xishuang, Qian, Lijun
Identification of key variables such as medications, diseases, relations from health records and clinical notes has a wide range of applications in the clinical domain. n2c2 2022 provided shared tasks on challenges in natural language processing for clinical data analytics on electronic health records (EHR), where it built a comprehensive annotated clinical data Contextualized Medication Event Dataset (CMED). This study focuses on subtask 2 in Track 1 of this challenge that is to detect and classify medication events from clinical notes through building a novel BERT-based ensemble model. It started with pretraining BERT models on different types of big data such as Wikipedia and MIMIC. Afterwards, these pretrained BERT models were fine-tuned on CMED training data. These fine-tuned BERT models were employed to accomplish medication event classification on CMED testing data with multiple predictions. These multiple predictions generated by these fine-tuned BERT models were integrated to build final prediction with voting strategies. Experimental results demonstrated that BERT-based ensemble models can effectively improve strict Micro-F score by about 5% and strict Macro-F score by about 6%, respectively.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Texas > Waller County > Prairie View (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
Robotic Multimodal Data Acquisition for In-Field Deep Learning Estimation of Cover Crop Biomass
Johnson, Joe, Chalasani, Phanender, Shah, Arnav, Ray, Ram L., Bagavathiannan, Muthukumar
Accurate weed management is essential for mitigating significant crop yield losses, necessitating effective weed suppression strategies in agricultural systems. Integrating cover crops (CC) offers multiple benefits, including soil erosion reduction, weed suppression, decreased nitrogen requirements, and enhanced carbon sequestration, all of which are closely tied to the aboveground biomass (AGB) they produce. However, biomass production varies significantly due to microsite variability, making accurate estimation and mapping essential for identifying zones of poor weed suppression and optimizing targeted management strategies. To address this challenge, developing a comprehensive CC map, including its AGB distribution, will enable informed decision-making regarding weed control methods and optimal application rates. Manual visual inspection is impractical and labor-intensive, especially given the extensive field size and the wide diversity and variation of weed species and sizes. In this context, optical imagery and Light Detection and Ranging (LiDAR) data are two prominent sources with unique characteristics that enhance AGB estimation. This study introduces a ground robot-mounted multimodal sensor system designed for agricultural field mapping. The system integrates optical and LiDAR data, leveraging machine learning (ML) methods for data fusion to improve biomass predictions. The best ML-based model for dry AGB estimation achieved a coefficient of determination value of 0.88, demonstrating robust performance in diverse field conditions. This approach offers valuable insights for site-specific management, enabling precise weed suppression strategies and promoting sustainable farming practices.
- North America > United States > Texas > Brazos County > College Station (0.16)
- North America > United States > Texas > Waller County > Prairie View (0.04)
- North America > United States > Delaware > New Castle County > Wilmington (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
Knowledge Acquisition on Mass-shooting Events via LLMs for AI-Driven Justice
Ihugba, Benign John, Nasrin, Afsana, Wu, Ling, Li, Lin, Qian, Lijun, Dong, Xishuang
Mass-shooting events pose a significant challenge to public safety, generating large volumes of unstructured textual data that hinder effective investigations and the formulation of public policy. Despite the urgency, few prior studies have effectively automated the extraction of key information from these events to support legal and investigative efforts. This paper presented the first dataset designed for knowledge acquisition on mass-shooting events through the application of named entity recognition (NER) techniques. It focuses on identifying key entities such as offenders, victims, locations, and criminal instruments, that are vital for legal and investigative purposes. The NER process is powered by Large Language Models (LLMs) using few-shot prompting, facilitating the efficient extraction and organization of critical information from diverse sources, including news articles, police reports, and social media. Experimental results on real-world mass-shooting corpora demonstrate that GPT-4o is the most effective model for mass-shooting NER, achieving the highest Micro Precision, Micro Recall, and Micro F1-scores. Meanwhile, o1-mini delivers competitive performance, making it a resource-efficient alternative for less complex NER tasks. It is also observed that increasing the shot count enhances the performance of all models, but the gains are more substantial for GPT-4o and o1-mini, highlighting their superior adaptability to few-shot learning scenarios.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Indiana > Marion County > Indianapolis (0.04)
- North America > United States > Texas > Waller County > Prairie View (0.04)
- (2 more...)
Data Augmentation via Diffusion Model to Enhance AI Fairness
Blow, Christina Hastings, Qian, Lijun, Gibson, Camille, Obiomon, Pamela, Dong, Xishuang
AI fairness seeks to improve the transparency and explainability of AI systems by ensuring that their outcomes genuinely reflect the best interests of users. Data augmentation, which involves generating synthetic data from existing datasets, has gained significant attention as a solution to data scarcity. In particular, diffusion models have become a powerful technique for generating synthetic data, especially in fields like computer vision. This paper explores the potential of diffusion models to generate synthetic tabular data to improve AI fairness. The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM), a diffusion model adaptable to any tabular dataset and capable of handling various feature types, was utilized with different amounts of generated data for data augmentation. Additionally, reweighting samples from AIF360 was employed to further enhance AI fairness. Five traditional machine learning models-Decision Tree (DT), Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF)-were used to validate the proposed approach. Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.88)
Enhancing LLM Fine-tuning for Text-to-SQLs by SQL Quality Measurement
Sarker, Shouvon, Dong, Xishuang, Li, Xiangfang, Qian, Lijun
Text-to-SQLs enables non-expert users to effortlessly retrieve desired information from relational databases using natural language queries. While recent advancements, particularly with Large Language Models (LLMs) like GPT and T5, have shown impressive performance on large-scale benchmarks such as BIRD, current state-of-the-art (SOTA) LLM-based Text-to-SQLs models often require significant efforts to develop auxiliary tools like SQL classifiers to achieve high performance. This paper proposed a novel approach that only needs SQL Quality Measurement to enhance LLMs-based Text-to-SQLs performance. It establishes a SQL quality evaluation mechanism to assess the generated SQL queries against predefined criteria and actual database responses. This feedback loop enables continuous learning and refinement of model outputs based on both syntactic correctness and semantic accuracy. The proposed method undergoes comprehensive validation on the BIRD benchmark, assessing Execution Accuracy (EX) and Valid Efficiency Score (VES) across various Text-to-SQLs difficulty levels. Experimental results reveal competitive performance in both EX and VES compared to SOTA models like GPT4 and T5.
- Research Report (1.00)
- Overview > Innovation (0.34)
Enhancing Deep Knowledge Tracing via Diffusion Models for Personalized Adaptive Learning
Kuo, Ming, Sarker, Shouvon, Qian, Lijun, Fu, Yujian, Li, Xiangfang, Dong, Xishuang
In contrast to pedagogies like evidence-based teaching, personalized adaptive learning (PAL) distinguishes itself by closely monitoring the progress of individual students and tailoring the learning path to their unique knowledge and requirements. A crucial technique for effective PAL implementation is knowledge tracing, which models students' evolving knowledge to predict their future performance. Based on these predictions, personalized recommendations for resources and learning paths can be made to meet individual needs. Recent advancements in deep learning have successfully enhanced knowledge tracking through Deep Knowledge Tracing (DKT). This paper introduces generative AI models to further enhance DKT. Generative AI models, rooted in deep learning, are trained to generate synthetic data, addressing data scarcity challenges in various applications across fields such as natural language processing (NLP) and computer vision (CV). This study aims to tackle data shortage issues in student learning records to enhance DKT performance for PAL. Specifically, it employs TabDDPM, a diffusion model, to generate synthetic educational records to augment training data for enhancing DKT. The proposed method's effectiveness is validated through extensive experiments on ASSISTments datasets. The experimental results demonstrate that the AI-generated data by TabDDPM significantly improves DKT performance, particularly in scenarios with small data for training and large data for testing.
- North America > United States > Texas > Waller County > Prairie View (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > Alabama > Madison County > Madison (0.04)
- Asia > China (0.04)